Suppose we have a single metric predictor \(x\). The underlying linear propensity of outcome \(k\) is denoted \(\lambda_k = \beta_{0,k} + \beta_{1,k}x\). The probability of outcome \(k\) is then given by: \[\phi_k = \operatorname{softmax}_S(\{\lambda_k\}) = \frac{\exp(\lambda_k)}{\sum_{c\in S}\exp(\lambda_c)}\] Why exponentiate? The author’s explanation: the values need to be non-negative and to preserve order. BB: this is not a sufficient explanation! How can we really call this a probability without more theoretical backing? Surely exponentiating greatly distorts how large the larger propensities seem compared to the smaller ones. It is important to note that in any logistic regression we are just building a model of the probabilities, and such models aren’t necessarily very accurate; the accuracy of the probability estimates is often far worse than the accuracy of the class predictions.
\[\text{As } \gamma \rightarrow \infty, \quad \frac{\exp(\gamma\lambda_k)}{\sum_{c\in S}\exp(\gamma\lambda_c)} \rightarrow \begin{cases} 1 & \text{if } \lambda_k = \max(\{\lambda_c\}) \\ 0 & \text{otherwise} \end{cases} \]
With very high \(\gamma\), softmax just assigns (nearly) all of the probability to the maximum input. Softmax is useful for applications that use the derivative to optimize, as it is smooth and differentiable.
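The limiting behaviour can be checked numerically. This is a minimal sketch, assuming a positive \(\gamma\) and some made-up propensities; the function name `softmax` and the values in `lams` are illustrative, not taken from the book.

```python
import math

def softmax(lams, gamma=1.0):
    """Softmax of the propensities `lams`, each multiplied by `gamma` (assumed > 0)."""
    # Subtracting the maximum before exponentiating avoids overflow;
    # the shift cancels between numerator and denominator.
    m = max(lams)
    exps = [math.exp(gamma * (lam - m)) for lam in lams]
    total = sum(exps)
    return [e / total for e in exps]

lams = [0.5, 1.0, 2.0]  # hypothetical propensities
for gamma in (1, 5, 50):
    # As gamma grows, the probability piles up on the largest propensity.
    print(gamma, [round(p, 4) for p in softmax(lams, gamma)])
```

At \(\gamma = 1\) the order of the propensities is preserved in the probabilities; at large \(\gamma\) the last outcome takes essentially all of the probability.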
There are indeterminacies in the system of equations: we can add a constant \(C_0\) to every \(\beta_{0,k}\) and a constant \(C_1\) to every \(\beta_{1,k}\) and get the same probabilities, because the common factor \(\exp(C_0 + C_1 x)\) cancels between the numerator and the denominator of the softmax. Therefore we can set the baseline and slope of one of the categories to arbitrary convenient constants. We will set the coefficients of one response category, called the reference category \(r\), to zero.
Because of the indeterminacy of the regression coefficients, we can interpret them only relative to the reference category. The regression coefficients can be conceived in terms of the log odds of each outcome relative to the reference outcome: \(\log\left(\frac{\phi_k}{\phi_r}\right)=\beta_{0,k}+\beta_{1,k}x\).
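Both claims can be checked numerically: shifting all coefficients so that the reference category's are zero leaves the probabilities unchanged, and the shifted coefficients then give the log odds against the reference. This is a sketch only; the coefficient values, the predictor value and the helper names `softmax_probs` and `set_reference` are made up for illustration.

```python
import math

def softmax_probs(x, b0, b1):
    """Outcome probabilities at predictor value x for intercepts b0 and slopes b1."""
    lams = [a + b * x for a, b in zip(b0, b1)]
    m = max(lams)  # shift for numerical stability
    exps = [math.exp(lam - m) for lam in lams]
    total = sum(exps)
    return [e / total for e in exps]

def set_reference(b0, b1, r):
    """Subtract category r's coefficients from every category (C0 = -b0[r], C1 = -b1[r])."""
    return [a - b0[r] for a in b0], [b - b1[r] for b in b1]

b0 = [0.3, -1.2, 2.0]   # hypothetical intercepts
b1 = [1.5, 0.4, -0.7]   # hypothetical slopes
r = 0                   # index of the reference category
b0_r, b1_r = set_reference(b0, b1, r)

x = 0.8
phi = softmax_probs(x, b0, b1)
phi_shifted = softmax_probs(x, b0_r, b1_r)
for k in range(len(b0)):
    log_odds = math.log(phi[k] / phi[r])
    # Both differences should be zero up to floating-point error.
    print(k, abs(phi[k] - phi_shifted[k]), abs(log_odds - (b0_r[k] + b1_r[k] * x)))
```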
This is an important property of the softmax function: it implies that the ratio of the probabilities of two outcomes is the same regardless of what other possible outcomes are included in the set. This can be shown in one line of algebra.
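A sketch of that one line, for any two outcomes \(k\) and \(j\) in \(S\) (the index \(j\) is introduced here for illustration): the normalizing sum cancels, so the ratio depends only on the two propensities involved.

\[\frac{\phi_k}{\phi_j} = \frac{\exp(\lambda_k)\,/\,\sum_{c\in S}\exp(\lambda_c)}{\exp(\lambda_j)\,/\,\sum_{c\in S}\exp(\lambda_c)} = \frac{\exp(\lambda_k)}{\exp(\lambda_j)} = \exp(\lambda_k - \lambda_j)\]

Setting \(j = r\) with \(\lambda_r = 0\) and taking logs recovers the log-odds expression \(\log(\phi_k/\phi_r) = \beta_{0,k} + \beta_{1,k}x\).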
Suppose our preference is 3:2:1 walking:cycling:bussing. If suddenly we couldn’t cycle, it’s intuitive that we would still like to keep a 3:1 ratio of walking to bussing.
This doesn’t accurately describe all situations. Suppose we prefer walking to bussing 3:1, but there are two types of bus (red and blue) that we choose between in a 1:1 ratio, so the walking:red ratio is 6:1. If the blue bus breaks down, we wouldn’t want to keep that 6:1 ratio; instead we would now want a 3:1 ratio of walking to red bus.
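The two travel examples can be worked through with exact fractions. This is a sketch of what the softmax property implies when an outcome is removed (the remaining probabilities are simply rescaled); the helper `drop_and_renormalize` and the dictionary keys are made up for illustration.

```python
from fractions import Fraction

def drop_and_renormalize(probs, dropped):
    """Remove one outcome and rescale the rest to sum to 1, keeping their ratios fixed."""
    kept = {k: p for k, p in probs.items() if k != dropped}
    total = sum(kept.values())
    return {k: p / total for k, p in kept.items()}

# Walking : cycling : bussing = 3 : 2 : 1
travel = {"walk": Fraction(3, 6), "cycle": Fraction(2, 6), "bus": Fraction(1, 6)}
no_cycle = drop_and_renormalize(travel, "cycle")
print(no_cycle["walk"] / no_cycle["bus"])   # 3, the intuitive answer

# Walking : red bus : blue bus = 6 : 1 : 1
buses = {"walk": Fraction(6, 8), "red": Fraction(1, 8), "blue": Fraction(1, 8)}
no_blue = drop_and_renormalize(buses, "blue")
print(no_blue["walk"] / no_blue["red"])     # 6, whereas intuition says 3
```

In the second case the fixed-ratio property keeps walking:red at 6:1, which is exactly the behaviour the bus example argues against.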
We can divide the set of outcomes into a hierarchy of two-set divisions, then use a logistic to describe the probability of each branch of the two-set divisions. See figure 22.2 for best demonstration.
Remember that each \(\phi\) is a conditional probability so we get structures such as: \(\phi_2 = \phi_{[2]|[2,3,4]} (1-\phi_{[1]|[1,2,3,4]})\).
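The way conditional probabilities multiply along the hierarchy can be sketched as below. The particular hierarchy (outcome 1 versus [2,3,4], then 2 versus [3,4], then 3 versus 4) is an assumption chosen to be consistent with the expression for \(\phi_2\) above; the coefficient values and function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def cond_logistic_probs(x, coefs):
    """Outcome probabilities for four outcomes under an assumed nested hierarchy.

    coefs holds one (b0, b1) pair per binary split:
      coefs[0]: outcome 1 versus [2,3,4]
      coefs[1]: outcome 2 versus [3,4]
      coefs[2]: outcome 3 versus 4
    """
    p1 = logistic(coefs[0][0] + coefs[0][1] * x)  # phi_{[1]|[1,2,3,4]}
    p2 = logistic(coefs[1][0] + coefs[1][1] * x)  # phi_{[2]|[2,3,4]}
    p3 = logistic(coefs[2][0] + coefs[2][1] * x)  # phi_{[3]|[3,4]}
    return [
        p1,
        p2 * (1 - p1),
        p3 * (1 - p2) * (1 - p1),
        (1 - p3) * (1 - p2) * (1 - p1),
    ]

coefs = [(-1.0, 2.0), (0.5, -1.5), (0.0, 1.0)]  # hypothetical (b0, b1) per split
probs = cond_logistic_probs(0.3, coefs)
print([round(p, 4) for p in probs], round(sum(probs), 12))
```

Each outcome's probability is the product of the branch probabilities on the path leading to it, so the four values always sum to 1.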
This results in outcome regions whose boundaries clearly reflect the order of divisions chosen to make up the hierarchy.
In general, conditional logistic regression requires that there is a linear division between two subsets of the outcomes, and then within each of those subsets there is a linear division of smaller subsets, and so on. This sort of division is not required of the softmax regression model.
With many predictors it can be virtually impossible to ascertain visually which sort of model is most appropriate; the choice of model is driven primarily by theoretical meaningfulness.
The categorical distribution is just like the Bernoulli but with several outcomes instead of only two. Outcomes are typically labeled as consecutive integers; however, this does not connote ordering or distance. Each outcome’s probability is given by the softmax function.
The JAGS implementation is much like the logistic one, except that we need to compute the softmax ourselves, as JAGS does not have it built in (Stan does). The dcat distribution in JAGS automatically normalizes its argument vector, so we needn’t explicitly prenormalize.
Every different outcome-partition hierarchy yields a different conditional logistic model. It would be difficult to draw a diagram of, as there are lots of layers. It is easy to do in JAGS though: we specify each component of mu explicitly in terms of which phi values are needed (the probabilities of outcomes as appropriate combinations of conditional probabilities), with everything else as normal.
Interpreting regression coefficients in the softmax model is very different from linear regression. In linear regression, a positive coefficient implies that y increases when the predictor increases. This is not the case in softmax regression, where a positive regression coefficient is only positive with respect to a particular reference outcome.
Example: outcomes 1, then 2, then 3, arranged in a line along the predictor. BB: let’s suppose the middle of the predictor range is 0.
If outcome 2 is the reference outcome:
If outcome 1 is the reference outcome:
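The effect of the choice of reference outcome can be illustrated numerically. This is a sketch with made-up coefficients for three outcomes ordered 1, 2, 3 along a predictor centred at 0; the values and helper names are assumptions, not results from the book.

```python
import math

def softmax_probs(x, b0, b1):
    """Outcome probabilities at predictor value x for intercepts b0 and slopes b1."""
    lams = [a + b * x for a, b in zip(b0, b1)]
    m = max(lams)  # shift for numerical stability
    exps = [math.exp(lam - m) for lam in lams]
    total = sum(exps)
    return [e / total for e in exps]

def rereference(b0, b1, r):
    """Re-express the coefficients so that category r (0-based index) is the reference."""
    return [a - b0[r] for a in b0], [b - b1[r] for b in b1]

# Hypothetical coefficients with outcome 2 (index 1) as the reference.
b0_ref2 = [-1.0, 0.0, -1.0]
b1_ref2 = [-2.0, 0.0, 2.0]

# The same model with outcome 1 (index 0) as the reference.
b0_ref1, b1_ref1 = rereference(b0_ref2, b1_ref2, r=0)

print(b1_ref2)  # [-2.0, 0.0, 2.0]: outcome 1 negative, outcome 3 positive
print(b1_ref1)  # [0.0, 2.0, 4.0]: outcomes 2 and 3 both positive

# Outcome 2 has a positive slope relative to outcome 1,
# yet its probability falls as x increases from 1 to 3.
print(softmax_probs(1.0, b0_ref1, b1_ref1)[1], softmax_probs(3.0, b0_ref1, b1_ref1)[1])
```

The two parameterizations give identical probabilities; only the signs and sizes of the slopes change with the reference outcome.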
setwd("./DBDA2Eprograms")
source("DBDA2E-utilities.R") # Load definitions of graphics functions etc.
source("Jags-Ynom-XmetMulti-McondLogistic1-Example.R")